Chapter I - Introduction



This investigation is concerned with the problem of structuring
sequences of instructional stimuli such that learning is optimized.
The particular type of learning considered is that of
"paired-associates", where one "trial" of a stimulus-response pair
consists of the presentation of the stimulus member of the pair,
followed by the subject's response, followed by presentation of the
response member for reinforcement. Learning a list of foreign
language vocabulary pairs in this manner can be thought of as an
example of paired-associate learning. The optimization of learning,
in the sense considered in this investigation, can take the form
either of maximizing learning for a specified number of trials, or of
minimizing the number of trials necessary to achieve a specified
level of learning. The quantitative evaluation of "level of learning"
takes the form, in most cases, of an expected value of a test score
obtained after learning has taken place.

The optimization problem, for the purposes of this investigation,
was considered abstractly as a sequential decision process with an
imbedded mathematical model of learning providing a criterion
function. The problem investigated can be expressed as follows:
"Given a mathematical model of paired-associate learning and a set,
S, of stimulus-response pairs to be learned, which element of S
should be selected for presentation at each trial so that either
learning is maximized for a given number of trials, m, or the number
of trials necessary to achieve a given level of learning is
minimized?" The sequence of presentations, $(s_1, s_2, \ldots, s_m)$,
thus obtained will be referred to as the optimal presentation
strategy for the given sequential decision problem.

It should be emphasized at this point that the primary orienta- 
tion of the research was toward the investigation of techniques of 
solution, and particularly computer-oriented techniques, for the 
abstract optimization problem just stated, as opposed to any invest- 
igation of the psychological relevance of the processes. 

A few general, but hopefully not very restrictive (in terms of
psychological relevancy), assumptions are made concerning the
framework of the sequential decision problem. First, it is assumed
that the presentation strategy can be either response-insensitive or
response-sensitive, depending on the model of learning used.
Secondly, it is assumed that the "state" of the model, in the form of
a state vector whose components consist of the probabilities of
incorrect response, appropriately quantized, for each of the elements
of the stimulus set, S, can be explicitly determined at each stage of
the sequential decision process. For response-sensitive strategies,
this determination will, of course, depend on the actual or simulated
response history of the subject. It is further assumed that the
effect on the state of the model of selecting a particular stimulus
for presentation can be determined at each stage of the process, in
terms of either explicit or expected changes in probabilities of
incorrect response. Finally, it is assumed that in the case of
response-sensitive strategies, where expected values of change in
state must be used, only two responses are possible, namely "correct"
and "incorrect".

Figures 1 and 2 illustrate, in terms of the framework just
described, the sequential decision processes applicable to learning
models which correspond to deterministic and non-deterministic state
transitions, respectively. Response-insensitive strategies may
correspond to deterministic or non-deterministic transitions,
depending on the learning model used, while response-sensitive
strategies will generally correspond to non-deterministic
transitions. Although certain of the learning models used imply
further restrictions, such as non-interaction of stimuli, the general
framework proposed for the problem necessitates no further
restrictions.

The sequential decision process for models imposing deterministic
state transitions is illustrated in Figure 1. It is assumed that the
model is initially in some arbitrary state, $Q_{ij}$. State $Q_{ij}$
is defined as follows:

$$Q_{ij} = \left( q_{ij}^{(1)}, q_{ij}^{(2)}, \ldots, q_{ij}^{(n)} \right) \qquad (1)$$

where

$$q_{ij}^{(r)} = \text{probability of incorrect response to the } r\text{th stimulus}$$

The n transitions emanating from this initial state (Node 01)
indicate that there are n possible stimuli to choose from for the
first trial and, in general, n different possible state transitions,
depending on the choice. The transitions will be determined by the
imbedded model of learning. Although most of the particular learning
models used imply independence of stimuli (i.e., each component of
the state vector is a function only of the presentation of the
corresponding stimulus), the decision process has been deliberately
formulated to allow dependence of each component on all presentations
and, by implication, on time (to include memory effects). Stage 1
illustrates the n new states which can result, corresponding to each
of the n stimuli, if chosen for presentation. In general, these
states will be different, although one or more of them could
conceivably be identical with the initial state. Stage 2 continues
the process definition by illustrating all possible state transitions
from each of the possible states at Stage 1. A dotted transition is
shown between $Q_{1n}$ and $Q_{21}$ to illustrate that, in general,
any node in the graph below the initial node need not have a unique
predecessor. The number of states at any stage will be less than or
equal to the number of states in the state space, namely $z^n$, where
z is the number of quantization levels.

An optimal presentation strategy for the process illustrated in
Figure 1 is defined as any sequence of presentations,
$(P_1, P_2, \ldots, P_m)$, where each $P_i$ is chosen from the set
$(s_1, s_2, \ldots, s_n)$, which maximizes an expected test score
after m trials. The expected test score in this case is defined as
follows:

$$E\{T_{mj}\} = \frac{100}{n} \sum_{r=1}^{n} \left( 1 - q_{mj}^{(r)} \right) = 100 - \frac{100}{n} \sum_{r=1}^{n} q_{mj}^{(r)} \qquad (2)$$

where the $q_{mj}^{(r)}$ are the components of the jth state of
stage m.
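
The expected test score of Equation (2) can be evaluated directly from
a state vector. The following is a minimal sketch only; the function
name `expected_test_score` and the use of a plain Python sequence for
the state vector are illustrative assumptions, not part of the
original investigation.

```python
def expected_test_score(state):
    """Expected test score (0 to 100) for a state vector whose
    components are probabilities of incorrect response, Equation (2)."""
    n = len(state)
    if n == 0:
        raise ValueError("state vector must have at least one component")
    # 100 - (100/n) * (sum of incorrect-response probabilities)
    return 100.0 - (100.0 / n) * sum(state)


if __name__ == "__main__":
    # Hypothetical two-stimulus state, chosen only for illustration.
    print(expected_test_score([0.30, 0.225]))
```

A state in which every item is certain to be missed scores 0, and a
state in which no item can be missed scores 100.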

As illustrated in Figure 2, the sequential decision process is
essentially the same when the state transitions are
non-deterministic, except that more than one transition is possible
as the result of the presentation of a particular stimulus. The
particular framework illustrated, with two possible transitions for
each presentation, is applicable either to dichotomous
response-sensitive models, such as the one-element model with correct
and incorrect responses, or to two-valued stochastic-increment
models, such as the random-trial increments (RTI) model. The
labelling of Figure 2 denotes correct (C) and incorrect (I) responses
corresponding to the presentation of each possible stimulus. It is,
of course, true that P{C} + P{I} = 1. The labelling could just as
well correspond to "increment" (I) and "no increment" (C) for the RTI
model. The other difference inherent in the non-deterministic process
illustrated by Figure 2 is that, in the normal usage, optimization
makes sense only in the context of an expected value of
$E\{T_{mj}\}$. In other words, for the deterministic process, given
the same initial state and the same model of learning, the same
presentation strategy will always be optimal; for the
non-deterministic process, on the other hand, the best that can be
done, a priori, is to specify an algorithm which will guarantee, on
the average, the optimal value of $E\{T_{mj}\}$.

The remainder of this report will be concerned with the
investigation of different approaches to the problem of optimal
instruction sequencing formulated as one of the sequential decision
processes of the type illustrated by Figures 1 and 2. This
formulation includes as special cases several previous investigations
reported by other authors. These investigations will be commented
upon at the appropriate point in the report. The Methods section of
the report discusses the theoretical and experimental approaches
taken to the problem and generally follows the organization of the
Results section. Part A of the Results section (Chapter III) is
concerned with exhaustive (globally optimal) optimization methods,
such as Dynamic Programming, and includes comments on previous
investigations involving this approach. Chapter III-B discusses
algorithmic methods, including the specification of an optimal
algorithm applicable to a class of learning models which includes the
single-operator linear model, the one-element model, and the RTI
model as special cases. Also included in this section are the results
of a number of Monte Carlo simulations designed to determine the
efficacy of the optimal algorithm relative to standard cyclical or
random presentation strategies. Chapter III-C outlines a possible
heuristic approach to the optimization problem for more complex
learning models which cannot be optimized algorithmically. A primary
advantage of this heuristic state-space search approach is that it
provides a means of overcoming the difficulties of dimensionality
inherent in methods such as Dynamic Programming for problems of even
moderately realistic complexity. The Conclusions section includes an
evaluation of the optimization methods proposed and suggestions for
further research.
Chapter II - Methods



The starting point for the initial research plan was an
investigation of the applicability and limitations of Dynamic
Programming (Bellman, 1957, 1961; Bellman & Dreyfus, 1962) in the
solution of the general optimization problem described in the
Introduction. Dynamic Programming approaches to problems of this type
have been taken or suggested by various researchers, including
Smallwood (1962), Matheson (1964), Groen and Atkinson (1966),
Smallwood (1971), and Calfee (1970). The approach taken in this
investigation was to attempt to determine general criteria of
applicability of Dynamic Programming to problems of the type
described, and to outline practical limitations of this method. In
addition, dimensionality reduction techniques and modified forms of
Dynamic Programming, such as State-increment Dynamic Programming
(Larson, 1968), were investigated for their potential in increasing
the practical applicability of this form of solution.

In the second phase of the research, algorithmic optimization
methods were investigated (i.e., methods by which the optimal
strategy can be specified outright, rather than reconstructed by
means of search techniques). Monte Carlo simulations of the
instructional process for several such algorithmic methods were
conducted for the purpose of determining the theoretical
effectiveness of these methods. Included in this work was a
simulation of the experiment conducted by Dear, et al. (1965), which
was designed to test an optimal presentation strategy based on the
one-element model of learning. The purpose of the simulation was to
answer some questions brought out by their study and to attempt to
obtain more substantive verification of their conclusions. The method
used was straightforward repetitive stochastic simulation with sample
sizes determined in part by tolerance criteria on the variance of the
sample means. The simulations were conducted on a PDP-15/40 computer
using an additive pseudo-random number generation scheme, and on an
IBM 370/155 computer using the SSP power residue method. Common runs
of certain cases were made using both machines to detect bias in the
results. As the simulation programs used were fairly short and
straightforward, representative examples are included for reference
in the Appendix, along with verification data for the pseudo-random
number generation schemes and programs to determine confidence levels
on the sample means.
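
The repetitive simulation procedure, with the sample size governed by
a tolerance on the variability of the sample mean, might be sketched
as follows. This is an illustration under stated assumptions and not
a reproduction of the programs in the Appendix: it uses Python's
built-in generator rather than the additive or power residue schemes
named above, the stopping rule (standard error of the mean at or
below `tol`) is one plausible reading of the tolerance criterion, and
all names and parameter values are hypothetical. The one-element
model is taken in its simplest form: an item in state F is answered
correctly only by guessing, and each presentation moves it to state C
with a fixed probability.

```python
import math
import random


def simulate_once(rng, n_items, n_trials, transition_prob, guess_prob):
    """One replication: cyclical presentation followed by a test."""
    conditioned = [False] * n_items
    for trial in range(n_trials):
        item = trial % n_items                  # standard cyclical strategy
        if not conditioned[item] and rng.random() < transition_prob:
            conditioned[item] = True            # transition from F to C
    correct = sum(
        1 for done in conditioned
        if done or rng.random() < guess_prob    # guess if still in F
    )
    return 100.0 * correct / n_items


def run_until_tolerance(seed, tol, min_reps=30, max_reps=100_000, **model):
    """Replicate until the standard error of the mean is <= tol."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    reps = 0
    std_err = math.inf
    while reps < max_reps:
        score = simulate_once(rng, **model)
        reps += 1
        total += score
        total_sq += score * score
        if reps >= min_reps:
            mean = total / reps
            var = max((total_sq - reps * mean * mean) / (reps - 1), 0.0)
            std_err = math.sqrt(var / reps)
            if std_err <= tol:
                break
    return total / reps, std_err, reps


if __name__ == "__main__":
    print(run_until_tolerance(seed=1, tol=0.5, n_items=5, n_trials=20,
                              transition_prob=0.3, guess_prob=0.2))
```

Running the same case with a second, independent generator and
comparing the two sample means is the kind of cross-check that the
common runs on both machines provided.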

The final phase of the research was concerned with the
investigation of optimization methods suitable for problems not
apparently amenable to algorithmic or Dynamic Programming solution.
It is anticipated that one source of such problems will be learning
models which allow for general stimulus interaction (and by
implication, memory of a sort). The approach taken was to investigate
the applicability of certain heuristic state-space search methods
developed in the field of Artificial Intelligence (cf. Nilsson, 1971;
Dreyfus, 1969; Hart, et al., 1968; Pohl, 1969). Since presently there
are apparently no generally accepted learning models of the type
necessitating such an approach, an attempt was made to formulate the
heuristic methods in terms of a general class of learning models
which would contain certain anticipated interactive features. As
such, the heuristic methods proposed serve primarily an illustrative
purpose. The heuristic solution paradigm will need to be refined by
further research as more appropriately complex learning models are
developed.
Chapter III - Results



A. Globally Optimal Search Techniques 

Several investigators have proposed the use of Dynamic Programming
in the solution of optimization problems similar to the problem
outlined in the Introduction of this report. Dynamic Programming is
the principal globally optimal search technique discussed in this
section, but is by no means the only such technique. Smallwood (1962)
is generally credited with making the first application of a Dynamic
Programming type of solution to an instructional sequencing problem,
although the problem examined is somewhat different and more specific
than the problem considered in this report. A later investigation by
Smallwood (1971) involved the application of Dynamic Programming to
the solution of an instruction sequencing problem which included cost
of instruction as an additional criterion. This problem falls more
within the framework of classical Dynamic Programming than does the
simple sequence optimization problem based solely on quantized
learning as a criterion, and would seem to constitute a more valid
application of this technique. Applications of Dynamic Programming
principles in contexts similar to that of the present investigation
are described by Dear (1964), Matheson (1964), Karush & Dear (1966),
and Calfee (1970). The Dynamic Programming aspects of these
investigations will not be discussed directly here, since the
comments to follow regarding another report will generally apply to
these as well.

The formulation which is perhaps most relevant to the present
discussion is that of Groen and Atkinson (1966). In addition, this
reference appears to be the most widely accepted and oft-cited
general application of Dynamic Programming principles to optimal
instruction-sequencing problems. The example chosen by the authors to
illustrate solution by Dynamic Programming is that of a sequence of
three presentations from a set of two instructional stimuli, using
the single-operator linear model to specify state transitions. The
"decision tree" used by the authors to illustrate the decision
process is shown in Figure 3. The following comments regarding the
application of Dynamic Programming to problems in optimal
instruction-sequencing will stem largely from this example, but will
not be restricted to it, since in all important respects, the general
structure of Figure 1 is embodied in this example.

The first observation is that while the tree structure of Figure
3 was apparently chosen to more lucidly illustrate the branching
characteristics of the decision process, it is, nevertheless, not the
customary framework for a Dynamic Programming formulation of the
problem. A Dynamic Programming formulation is ordinarily given in
terms of a mapping of the state space of the process into itself at
each succeeding stage. In the customary formulation, the mappings
stemming from the particular initial state shown would be illustrated
as in Figure 4. In a cursory sense, of course, Figure 4 merely
results from a consolidation of common states at each stage in Figure
3. In terms of practical considerations imposed by computer
implementation, however, the implications are more far-reaching.
First of all, forward Dynamic Programming, which could have
advantages over backward Dynamic Programming for applications of this
type, is not directly suited to the problem as implemented literally
as shown in Figure 3. The reason for this is that common states in
the forward direction have been separated and are treated in memory
no differently from states which are actually distinct. It would be
necessary to effect a search through the whole list of successors at
each stage in order to identify common states so that recursive
optimization could be performed, a task which grows exponentially in
magnitude with the number of stages. In the customary Dynamic
Programming formulation, the computational task at each stage is
independent of the number of stages, since the entire state space is
represented systematically in memory at each stage. The size of the
state space depends, of course, on the quantization accuracy of the
components of the state vector, as well as on the number of
components, in contrast to the size of the final stage of the tree
representation, which depends on the number of stages and the number
of components (it is assumed that the number of components in the
state vector is the same as the number of instructional stimuli in
the set, S). Note that the quantization accuracy, i.e., the accuracy
with which the magnitudes of the components of the state vector are
represented, does not affect memory requirements for the tree
representation insofar as increased accuracy does not require
extended precision arithmetic in the computer.
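
The effect of consolidating common states can be illustrated with a
short sketch that expands the two-stimulus example both ways. The
operator value of 1/2 and the initial state (.60, .90) are assumptions
inferred from the node values quoted in this chapter; the function
names are hypothetical. Exact fractions are used so that states
reached by different paths compare equal.

```python
from fractions import Fraction

ALPHA = Fraction(1, 2)                            # assumed linear operator
INITIAL = (Fraction(60, 100), Fraction(90, 100))  # assumed initial state


def present(state, r):
    """Single-operator linear model: only component r is reduced."""
    return tuple(ALPHA * q if k == r else q for k, q in enumerate(state))


def stage_sizes(initial, presentations):
    """Return (tree_counts, merged_counts), initial stage first."""
    tree = [initial]        # every path kept separately (tree form)
    merged = {initial}      # common states consolidated (standard form)
    tree_counts, merged_counts = [1], [1]
    n = len(initial)
    for _ in range(presentations):
        tree = [present(s, r) for s in tree for r in range(n)]
        merged = {present(s, r) for s in merged for r in range(n)}
        tree_counts.append(len(tree))
        merged_counts.append(len(merged))
    return tree_counts, merged_counts


if __name__ == "__main__":
    print(stage_sizes(INITIAL, 3))
```

The tree doubles at every stage, whereas the consolidated form grows
by only one state per stage in this example, since the state reached
depends only on how many times each stimulus has been presented.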

For very small problems, the tree representation is less
restrictive in terms of computer memory requirements. In the example
of Figure 3, the last two stages impose a fast-memory requirement of
12 "nodes" (the term "node" is used as the measure of memory
requirement, rather than a decomposition into a more detailed
specification in terms of the number of words or bytes required to
contain the information represented by each node, since this is less
obfuscative and provides a greater degree of generality). A standard
Dynamic Programming formulation, on the other hand, would impose a
fast-memory requirement of $2 \times 10^4$ nodes, assuming a
quantization interval of 0.01. These requirements are determined from
the fact that, in general, sufficient fast storage must be available
to contain all information relating to any two successive stages in
order to implement a practically feasible Dynamic Programming
solution. Since the amount of storage required is stage-dependent in
the tree formulation, the limitation is obtained from the storage
required to contain the last two stages, since these are the largest.
Hence, the 12-node figure. The requirement is the same for any two
successive stages in the standard formulation, and is numerically
equal to twice the size of the state space, which in this case is
$(100)^2$ or $10^4$.

Figure 4 - Sequential Decision Process
(Standard Formulation)

Again, the requirement is independent of the number of stages for
the standard formulation. If the example process were continued to
21 stages (not seemingly unreasonable) the fast-memory requirement
for the standard formulation would remain at $2 \times 10^4$, while
the requirement for the tree formulation would increase to more than
$1.5 \times 10^6$. In addition, direct comparisons such as this must
be tempered with the fact that the tree representation provides an
optimal solution to a sequential decision problem beginning at one
particular initial state, while the standard formulation provides the
solutions to a family of sequential decision problems beginning at
any initial state in the state space. If optimal sequences are to be
obtained for a number of different initial states, these sequences
would, in effect, be obtained with a single recursive optimization
pass through the stages using the standard Dynamic Programming
formulation, while a separate complete optimization would be required
for each initial state using the tree formulation.
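
The two fast-memory requirements compared above can be computed with
a few lines of code. This is a sketch under two assumptions: that a
quantization interval of 0.01 corresponds to 100 levels per component,
as the $(100)^2$ figure suggests, and that the initial node is counted
as Stage 1, which is the convention that reproduces the 12-node
figure for the two-stimulus example. The function names are
hypothetical.

```python
def standard_nodes(components, interval):
    """Standard formulation: two full copies of the quantized state space."""
    levels = round(1 / interval)      # 0.01 -> 100 levels (assumed)
    return 2 * levels ** components


def tree_nodes(stimuli, stages):
    """Tree formulation: nodes in the last two stages.

    The initial node is taken to be Stage 1, so Stage k holds
    stimuli ** (k - 1) nodes.
    """
    if stages < 2:
        return 1
    return stimuli ** (stages - 2) + stimuli ** (stages - 1)


if __name__ == "__main__":
    print(standard_nodes(2, 0.01))    # standard formulation, two stimuli
    print(tree_nodes(2, 4))           # tree after three presentations
    print(tree_nodes(2, 21))          # tree continued to 21 stages
```

For two stimuli the tree requires 12 nodes at four stages and
1,572,864 nodes at 21 stages, against a constant 20,000 nodes for the
standard formulation.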

The important point regarding fast-memory and computation-time
limitations is that for any implementation involving straightforward
Dynamic Programming techniques, the size of the problem which may be
treated is severely limited. For example, for a state vector with
only five components (five paired-associate items to be learned) and
a quantization interval of 0.01, the fast-memory requirement would be
$2 \times 10^{10}$ nodes. Bearing in mind that the minimum
conceivable associated byte requirement would be [illegible] and that
even the largest present-day computers have fast-memory sizes on the
order of only $10^6$ or $10^7$ bytes, this requirement is clearly
prohibitive. For the tree formulation, five components would impose a
fast-memory requirement of approximately $2 \times 10^{10}$ nodes
after only 16 stages. Table 1 is a compilation of fast-memory
requirements in terms of number of nodes for the standard formulation
over a range of state vector size and quantization interval size,
while Table 2 shows the fast-memory requirements for the tree
formulation over a range of state vector size and number of stages.
The barriers imposed by fast-memory limits of present-day computers
are illustrated by dashed lines in the tables. A straightforward
Dynamic Programming solution would thus be implementable only for
values of the parameters corresponding to points in the upper portion
of each table.

Modified Dynamic Programming techniques, such as the
State-increment Dynamic Programming of Larson (1968), can, in some
cases, effect reductions of fast-memory requirements by two or three
orders of magnitude. It would seem, however, that even with this
great a reduction, the requirements for most cases of practical
interest would still be prohibitive in general.

Table 1 - Fast-memory Requirements (in nodes)
for Standard Formulation

Table 2 - Fast-memory Requirements (in nodes)
for Tree Formulation

Dynamic Programming solutions, including those described above, as
ordinarily implemented, are, in effect, breadth-first state-space
search techniques, which impose a fundamental limitation in terms of
memory requirements, as has been seen. It is possible, for the
instruction sequence optimization problem under consideration, to use
a form of depth-first search to partially alleviate this space
restriction. The tree formulation is more convenient for
visualization of this method, although it is not requisite. To
illustrate depth-first search, suppose that in the example of Figure
3, available fast memory was restricted to 3 nodes. This would
prohibit application of standard Dynamic Programming, which would
require 12-node storage. Instead of "searching" through the entire
last stage, as would be done in the first step of a conventional
Dynamic Programming formulation, the search could be conducted one
node at a time from Stage 3. For example, the first node at the left
of Stage 3, (.15, .90), could be stored, along with its successors,
(.08, .90) and (.15, .45). On the basis of this information, it could
be concluded that the optimal decision from node (.15, .90) would be
$s_2$. The next step would be to attempt to "back up" to the parent
node of (.15, .90) at Stage 2, which would be (.30, .90), and
determine the optimal decision at that node. This cannot be done,
however, until the optimal decision at node (.30, .45) (second from
left) at Stage 3 is known, so this is determined next. This process
is continued until the optimal decision at the initial node can be
determined. For each successive step in the optimization, only three
nodes at a time need be considered. In general, it is not necessary
that the nodes contained in the portion of the tree being optimized
be restricted to two successive stages. The only limitation is the
size of the subtree which can be contained in memory at any given
time. Even though this method may alleviate storage requirement
restrictions in some cases, computation time restrictions do not
permit any significant practical extension of the size of the problem
which may be solved by globally optimal techniques of this type. In
the previously cited example involving a state vector with 5
components, a subtree of 10 stages would be the largest that could
realistically be optimized at one time. This means that the total
optimization for 16 stages would involve
$5^{16-10} = 5^6 \approx 1.5 \times 10^4$ separate steps. If each
step required one minute of computation time (a very conservatively
low estimate), the entire optimization would take 250 hours, and each
succeeding stage would multiply this figure by a factor of 5.
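
The depth-first procedure can be sketched recursively for the
two-stimulus example. The sketch is illustrative only: the operator
value 0.5 and the initial state (.60, .90) are assumptions inferred
from the node values quoted above, the function names are
hypothetical, and the recursion holds only the current path in memory
rather than the three-node working set described in the text.

```python
ALPHA = 0.5              # assumed linear operator
INITIAL = (0.60, 0.90)   # assumed initial state


def present(state, r):
    """Single-operator linear model: only component r is reduced."""
    return tuple(ALPHA * q if k == r else q for k, q in enumerate(state))


def depth_first(state, remaining):
    """Return (minimum sum of error probabilities, one optimal sequence).

    Successors are examined one at a time, so no complete stage of
    the tree is ever held in memory.
    """
    if remaining == 0:
        return sum(state), []
    best_total, best_seq = float("inf"), None
    for r in range(len(state)):
        total, seq = depth_first(present(state, r), remaining - 1)
        if total < best_total:            # keep the first minimum found
            best_total, best_seq = total, [r + 1] + seq
    return best_total, best_seq


if __name__ == "__main__":
    print(depth_first(INITIAL, 3))
```

Under these assumptions the search returns a strategy in which the
first stimulus is presented once and the second twice. The saving in
memory is paid for in time, since every path of the tree is still
visited.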

The final observation with regard to the application of Dynamic
Programming to the solution of optimal instruction-sequencing
problems is that in most cases, at least for the paired-associate
framework, it would appear to be unnecessary. The reason for this is
that path costs, which are ordinarily independent of the states of
the process in the sense that they need not be a function of the
states joined, are in fact a direct function of the states joined for
processes of the type illustrated in Figure 1, of which the process
of Figure 3 is a special case. The result is that with the process of
Figure 3, for example, once the state with the minimum value of
$q^{(1)} + q^{(2)}$ is located, the problem is effectively solved,
since reconstruction of any path from any such state back to the
initial state constitutes optimization. Recursive optimization is
unnecessary. For general processes of the type illustrated in Figure
1, all that is necessary to effect optimization is to search the
final stage for the state, $Q_{mj}$, for which
$\sum_{r=1}^{n} q_{mj}^{(r)}$ is minimum and reconstruct any path of
m segments back to the initial state. This path constitutes an
optimal presentation strategy. It may be noted that since the models
considered do not allow for stimulus interaction, the order of
presentation is immaterial. All that is required to construct an
optimal presentation strategy is a count of the number of times each
element of the stimulus set is to be presented. It may be observed,
for example, that the three optimal presentation strategies
($s_1 s_2 s_2$, $s_2 s_1 s_2$, and $s_2 s_2 s_1$) derived by Groen
and Atkinson for the problem of Figure 3 are merely all possible
distinguishable permutations of the set $(s_1, s_2, s_2)$. It is not
even necessary that the fast-memory capacity be sufficient to contain
the entire last stage. The search may be broken into as many segments
as necessary, since the only information which must be transferred
between segments is a single value of $\sum_{r=1}^{n} q^{(r)}$
corresponding to the current optimal state and a state identifier. In
theory, then, this simplification in optimization procedure entirely
eliminates the fast-memory constraint inherent in a Dynamic
Programming formulation. Unfortunately, the computation-time
constraint prevents any significant extension of the size of the
problem which may be treated. In the previous 5-component example,
for instance, the state space size was $2 \times 10^{10}$. This means
that the final stage must be divided into at least $2 \times 10^4$
segments, allowing for a fast-memory capacity of $10^6$ nodes. If
each segment could be searched in 10 seconds (a conservatively low
estimate) the entire search would take approximately 55 hours, and
each added component would increase this figure by two orders of
magnitude.
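
Because the order of presentation is immaterial, the final-stage
search reduces to a scan over presentation counts in which only a
running minimum is retained. The sketch below applies this to the
two-stimulus example; the operator value 0.5 and the initial state
(.60, .90) are assumptions inferred from the node values quoted
earlier, and the function names are hypothetical.

```python
from itertools import permutations

ALPHA = 0.5              # assumed linear operator
INITIAL = (0.60, 0.90)   # assumed initial state


def count_vectors(n, m):
    """Yield every tuple of n non-negative counts summing to m."""
    if n == 1:
        yield (m,)
        return
    for first in range(m + 1):
        for rest in count_vectors(n - 1, m - first):
            yield (first,) + rest


def best_counts(initial, m):
    """Scan the final-stage states, carrying only the current minimum
    and its identifier (the count vector) from one state to the next."""
    best_total, best = float("inf"), None
    for counts in count_vectors(len(initial), m):
        total = sum(q * ALPHA ** c for q, c in zip(initial, counts))
        if total < best_total:
            best_total, best = total, counts
    return best_total, best


def strategies(counts):
    """All distinguishable orderings of the chosen presentation counts."""
    items = [r + 1 for r, c in enumerate(counts) for _ in range(c)]
    return sorted(set(permutations(items)))


if __name__ == "__main__":
    total, counts = best_counts(INITIAL, 3)
    print(total, counts, strategies(counts))
```

Under these assumptions the scan selects one presentation of the
first stimulus and two of the second, and the three orderings of
those presentations are the three optimal strategies cited above.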

The conclusion which apparently must be reached is that the practi- 
cal applicability of globally optimal search techniques to the problem 
at hand, even for the simplest cases, is inherently and rather severely 
limited by dimensionality constraints. 
B. Algorithmic Optimization Methods



Algorithmic methods of optimization for several special cases of
the general optimal instruction-sequencing problem described in the
Introduction have been proposed by various researchers. The term
"algorithmic" is used here to indicate methods by which the optimal
decision at each stage in succession may be determined directly from
the present state of the process together with the learning model and
possibly a response history. In other words, an algorithmic method
is taken here to mean one by which each element of an optimal
sequence may be determined directly as the sequence proceeds, as
opposed to methods like Dynamic Programming, where the entire
recursive optimization must be performed before the sequence can be
determined.

The primary purpose of this section of the report will be to
describe an optimization algorithm which applies to a broad class of
learning models which includes as special cases the single-operator
linear model (Bush & Sternberg, 1959), the one-element model (Bower,
1961; Estes, 1960) and the random-trial increments (RTI) model
(Norman, 1964). The linear and RTI models are described in Figures
5-1 and 5-2, respectively, while the one-element model is described
in Figure 6. The reason for discussing the algorithm in terms of
these particular models is that they appear from the literature to be
the most widely accepted and analyzed models of paired-associate
learning. In addition to a description of the algorithm and its
application to paired-associate learning processes based on these
three models, some Monte Carlo simulation results will be discussed
which compare the effectiveness of the optimal presentation strategy,
as specified by the algorithm, with standard cyclical or random
strategies.

Prior to the statement of the algorithm, a few definitions are in
order. First, let the learning process to be optimized be represented
by either Figure 1 (deterministic) or Figure 2 (non-deterministic),
where the states, $Q_{ij}$, and the function to be maximized,
$E\{T_{mj}\}$, are as defined by equations (1) and (2), respectively,
in the Introduction. Next, let the gain function, $\Delta Q_{ijk}$,
be defined as the increase in E{T} produced by a transition from the
jth state of Stage i to the kth state of Stage i+1. Thus,

$$\Delta Q_{ijk} = E\{T_{i+1,k}\} - E\{T_{ij}\} \qquad (3)$$

If stimulus actions are treated as being independent by the learning
model used in optimization, as is the case with the linear,
one-element, and RTI models, then $\Delta Q_{ijk}$ is a function only
of the component of the state vector corresponding to the stimulus
presented. Thus, if stimulus r is presented at Stage i, then
$\Delta Q_{ijk}$ is simply the difference, scaled by the constant
100/n of Equation (2), between the value of the rth component before
presentation of the corresponding stimulus and the value after
presentation:

$$\Delta Q_{ijk} = \frac{100}{n} \left( q_{ij}^{(r)} - q_{i+1,k}^{(r)} \right) \qquad (4)$$

Figure 5-1 Single Operator Linear Model
(horizontal axis: Trial Number, i)

Figure 5-2 Random-trial Increments Model
(horizontal axis: Trial Number, i; a two-valued quantity equal to 1
with probability c and 0 with probability 1-c)

Figure 6 One-element Model
(parameters: "guessing" probability; probability of transition from
F to C)

The class of learning models for which the optimization algorithm 
to be specified applies is defined as follows: 

1) The model must be applicable to paired-associate learn- 
ing as specified by the structure and associated descrip- 
tions of Figures 1 and 2. 

2) Stimulus interaction must be negligible, i.e., $\Delta Q_{ijk}$ 
must be as specified by Equation (4). 

3) $\Delta Q_{ijk}$ must be a non-negative monotonic non-increasing 
function of i. 

Conditions 2) and 3) imply that $\Delta q$ for each individual component 
must be non-negative monotonic non-increasing, i.e., the learning model 
itself must have this property. 

At each stage, i, of a process described by Figure 1 or Figure 2, 
the process will be in a given state, $Q_{ij}$. Presentation of a given 
stimulus, $s_r$, will cause a transition to a new state, $Q_{i+1,k}$, 
at Stage i+1. If the imbedded learning model satisfies the three 
conditions specified above, then an algorithm which specifies an 
optimal presentation strategy for the process is as follows: 



A: Choose for presentation at each stage, i, that stimulus 
which produces the largest value of $\Delta Q_{ijk}$ (for determin- 
istic models) or the largest value of $E\{\Delta Q_{ijk}\}$ (for non- 
deterministic models). 



This algorithm is, in effect, a more general version of the Largest 
Immediate Gain (LIG) algorithm of Calfee (1970). (Algorithm A was 
arrived at independently of the work of Calfee, a fact which would 
tend to support the validity of both algorithms.) 
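
As an illustration only, algorithm A might be sketched as follows for 
the deterministic single-operator linear model, in which one presentation 
of item r changes its error probability from q to alpha*q, so that the 
immediate gain is q*(1 - alpha). The function name, argument names, and 
parameter values are hypothetical and are not taken from the programs of 
the Appendix. 

```python
def greedy_presentation(q0, alpha, m):
    """Sketch of algorithm A for the linear model (illustrative names only).

    q0[r]    : initial probability of an incorrect response for item r
    alpha[r] : linear-model parameter for item r (q becomes alpha[r] * q)
    m        : number of trials available
    Returns the presentation sequence and the final q values.
    """
    q = list(q0)
    sequence = []
    for _ in range(m):
        # Immediate gain of presenting item r: q[r] - alpha[r]*q[r].
        gains = [q[r] * (1.0 - alpha[r]) for r in range(len(q))]
        # Ties are resolved in favor of the lowest item index.
        r_best = max(range(len(q)), key=gains.__getitem__)
        q[r_best] *= alpha[r_best]
        sequence.append(r_best)
    return sequence, q


# Assumed example: three items with different learning rates.
seq, q_final = greedy_presentation(q0=[0.5, 0.5, 0.5], alpha=[0.9, 0.6, 0.8], m=6)
expected_score = sum(1.0 - q for q in q_final)
```

With equal parameters for all items the same sketch simply cycles through 
the list, which agrees with the standard cyclical strategy. 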

The reasoning behind the claim of optimality for algorithm A is 
as follows: since there is negligible stimulus interaction, the values 
of $\Delta Q$ produced by each stimulus, $s_r$, at any given point in the 
process, can be treated individually for that stimulus. Since these 
values of $\Delta Q$ are also non-negative and monotonic non-increasing 
with i, the value of $\Delta Q$ produced by a current presentation of 
$s_r$ will be at least as great as that produced by any subsequent 
presentation of $s_r$. In other words, only current values of $\Delta Q$ 
for all $s_r$ need be considered at each stage of the process, since each 
of these values will be at least as large as subsequent values for the 
same stimulus. In more picturesque terms, the values of $\Delta Q$ for 
the process may be viewed as blocks whose size is proportional to the 
magnitude of the corresponding $\Delta Q$, with the blocks arranged in n 
stacks, corresponding to the n different stimuli. Presentation of a 
given stimulus corresponds to removing a block from the top of the 
corresponding stack. An optimal presentation strategy for m 
presentations (trials) corresponds to removing the m largest blocks 
from the tops of the stacks. Note that this analogy makes very clear 
the fact that the order in which the m largest blocks are removed is 
immaterial, subject only to the restriction that only the top block of 
any given stack may be removed at each step. The optimality of the 
procedure of removing at each step the block which is currently largest 
among the top blocks is apparent from the observation that this block 
is the largest anywhere in any stack, since each top block is as large 
or larger than any other block in its stack. 
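
The block analogy can be checked numerically on a small case. The 
following sketch, which assumes linear-model stacks (block sizes 
q*(1-alpha), q*alpha*(1-alpha), ...) and uses hypothetical function names 
and parameter values, compares the total gain obtained by always removing 
the largest top block with the best total found by trying every 
allocation of m presentations among the items. 

```python
from itertools import product


def block_stack(q0, alpha, depth):
    """Gains of successive presentations of one item, largest block on top."""
    return [q0 * alpha**k * (1.0 - alpha) for k in range(depth)]


def greedy_total(stacks, m):
    """Remove the currently largest top block m times; return the total gain."""
    tops = [0] * len(stacks)
    total = 0.0
    for _ in range(m):
        r = max(range(len(stacks)), key=lambda s: stacks[s][tops[s]])
        total += stacks[r][tops[r]]
        tops[r] += 1
    return total


def best_total_exhaustive(stacks, m):
    """Try every split of m presentations over the items; return the best total."""
    best = 0.0
    for counts in product(range(m + 1), repeat=len(stacks)):
        if sum(counts) == m:
            best = max(best, sum(sum(s[:c]) for s, c in zip(stacks, counts)))
    return best


m = 5
stacks = [block_stack(0.5, a, m + 1) for a in (0.9, 0.6, 0.8)]
greedy_gain = greedy_total(stacks, m)
exhaustive_gain = best_total_exhaustive(stacks, m)
```

The exhaustive count grows as (m+1)^n, so the comparison is practical 
only for very small lists; it is meant as a check of the argument, not 
as a solution method. 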

The block analogy applies directly only to deterministic models 
in the sense that the block sizes are fixed. The reasoning is essen- 
tially the same for non-deterministic models except that expected 
values of $\Delta Q$ must be used (these may or may not depend on Bayesian 
corrections, depending on whether or not the model is response- 
sensitive. Deterministic models are inherently response-insensitive). 
Choosing at each step the stimulus for which $E\{\Delta Q\}$ is greatest 
produces, on the average, the largest value of $E\{T\}$. 

Note that algorithm A does not require that the same model para- 
meters, or even the same model, be used for all items in the paired- 
associate list. In terms of the block analogy, the algorithm obvi- 
ously holds, regardless of the relative sizes of the stacks, as long 
as the blocks in each stack are arranged in decreasing order of size. 
Algorithm A is thus more general than certain other algorithms which 
have been proposed to apply only to cases where the same model with 
the same parameters is applied to each paired-associate item. 

As mentioned previously, algorithm A will be applied to the lin- 
ear, one-element, and RTI models of paired-associate learning. Con- 
ditions 1) and 2) of this section are obviously satisfied by each of 
these three models. In order to show that A holds for each model, 
therefore, it is necessary only to show further that Condition 3) 
is satisfied. 

Calfee (1970) and Atkinson and Paulson (1972) have correctly ob- 
served that the optimal presentation strategy for the special case 
of the linear model with the same value of $\alpha$ and the same initial 
value of the probability of incorrect response, q, for all items is 
the same as the standard cyclical strategy commonly employed in 
paired-associate experiments. Since the order of arrangement of 
stimuli in the stimulus sequence is immaterial, this strategy is 
equivalent to presentation of each stimulus $k_1$ times, where $k_1$ is 
the largest integer less than or equal to m/n, followed by presen- 
tation of any $k_2$ different stimuli, where 

$k_2 = m - k_1 n$. 

Algorithm A, however, is more general than this algorithm in that 
the value of $\alpha$ and the initial value of q associated with each 
stimulus are arbitrary (within the unit interval). For equal values of 
$\alpha$ and q, of course, A produces the standard cyclical strategy, in 
effect. 

The fact that Condition 3) is satisfied for the linear model 
can be seen from the fact that $q_i$ satisfies Condition 3), since 

$$q_{i+1} = \alpha q_i$$

and $0 < \alpha \le 1$. Hence, 

$$q_i - q_{i+1} = q_i - \alpha q_i = q_i (1 - \alpha)$$

also satisfies Condition 3), since $q_i$ satisfies Condition 3) and 
$0 \le (1 - \alpha) < 1$. Therefore, A holds for the linear model. 

In order to determine the effectiveness of A under conditions 
other than uniform parameters, Monte Carlo simulations were conduct- 
ed for two different cases. In the first case, all initial values 
of q were set equal to the complement of the guessing probabilities, 
but the values of $\alpha$ for the individual items were chosen randomly 
according to a truncated Gaussian distribution. Mean values were 
chosen at 0.4, 0.6, 0.7, 0.8, 0.85, 0.90, 0.95 and 0.97 to permit 
coordination with Calfee's results (1970). The standard deviation 
was specified to be the difference between the mean value and the 
closest boundary of the unit interval, and the distribution was 
truncated at this boundary and at the symmetric point on the other 
side of the mean value. The optimal strategy specified by A for 
this case will not, in general, be the same as the standard cycli- 
cal strategy, so a simulation was conducted to compare the effective- 
ness of the two (details of the simulation are contained in the Ap- 
pendix), using a simulated paired-associate list of 10 items. 
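
A simulation of this kind might be sketched as follows. This is not the 
program listed in the Appendix; the function names, the initial error 
probability of 0.5, the choice of 50 trials, and the use of rejection 
sampling for the truncated Gaussian are all assumptions made for 
illustration. 

```python
import random


def sample_alpha(mean, rng):
    """Draw alpha from a Gaussian truncated symmetrically about `mean`."""
    half_width = min(mean, 1.0 - mean)     # distance to the nearest boundary
    while True:
        a = rng.gauss(mean, half_width)    # standard deviation = that distance
        if abs(a - mean) < half_width:     # reject draws beyond the truncation
            return a


def expected_score(alpha, q0, m, strategy):
    """Mean probability of a correct response after m trials (linear model)."""
    q = [q0] * len(alpha)
    for trial in range(m):
        if strategy == "cyclical":
            r = trial % len(q)
        else:                              # algorithm A: largest immediate gain
            r = max(range(len(q)), key=lambda s: q[s] * (1.0 - alpha[s]))
        q[r] *= alpha[r]
    return sum(1.0 - x for x in q) / len(q)


def compare(mean, n_items=10, m=50, q0=0.5, samples=200, seed=0):
    """Average scores of algorithm A and the cyclical strategy over samples."""
    rng = random.Random(seed)
    total_a = total_c = 0.0
    for _ in range(samples):
        alpha = [sample_alpha(mean, rng) for _ in range(n_items)]
        total_a += expected_score(alpha, q0, m, "greedy")
        total_c += expected_score(alpha, q0, m, "cyclical")
    return total_a / samples, total_c / samples


score_a, score_c = compare(mean=0.8)
```

Because the linear model is deterministic, the only randomness in this 
sketch is in the choice of the $\alpha$ values for each sample. 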

The results of the simulation are shown in Figures 7-1 through 
7-8. As can be seen from these figures, the maximum advantage 
provided by A under any conditions is on the order of 3%. The 
curves shown represent average values for a sample size of 200. 
In other words, the simulated process was run 200 times, each time 
with a different set of $\alpha$ values chosen as described above (Note: 
the fact that a value of 200 trials was chosen as the terminating 
value for the simulation is coincidental; the number of trials and 
the sample size are not necessarily related). 

The second case involving the linear model was a comparison of 
the strategy specified by A for uniform parameters (the cyclical 
strategy) versus a uniform random strategy. Note that these two 
strategies are asymptotically equivalent for large values of m 
(in a sense, all strategies are equivalent for large m, since 
$E\{T\}$ asymptotically approaches 100% for any strategy when the 
learning model satisfies the three conditions for A), but for 
finite values of m, the random strategy, on the average, will pro- 
duce a lower value of $E\{T\}$. The results of this simulation are 
shown in Figures 7-9 through 7-16. Again, the maximum advantage 
produced by A is on the order of 5%. Note that, for most values 
of $\alpha$, the values of $E\{T\}$ for the two strategies appear to 
converge faster than do the corresponding values for the two strategies 
in the first case. This is apparently due to the fact that the effects 
of values of $\alpha$ distributed about some mean value are not, in 
general, symmetric. It might be noted that Figures 7-1 through 7-16 
apply, as well, to the RTI model used in response-insensitive mode under 
the same conditions, with the only difference being that the ordi- 
nates represent average values of $E\{T\}$, since this model is non- 
deterministic. An equivalence between the parameters of the two 
models results from the fact that, for the RTI model in response- 
insensitive mode: 

$$E\{q_{i+1}\} = c\,\alpha_R\, q_i + (1-c)\, q_i = q_i (1 - c + \alpha_R c)$$

Hence, the same values of $E\{q_i\}$ are obtained for the RTI model 
with parameters $\alpha_R$ and c, as values of $q_i$ are obtained for the 
linear model with 

$$\alpha = 1 - c + \alpha_R c.$$
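
The equivalence can be checked by direct simulation. The sketch below 
assumes the RTI model in response-insensitive mode, with an increment 
(q multiplied by the RTI parameter, here called alpha_r) occurring with 
probability c on each presentation; the function name and the numerical 
values are hypothetical. 

```python
import random


def rti_mean_q(q0, alpha_r, c, trials, runs, seed=0):
    """Average q after `trials` presentations of one item under the RTI model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        q = q0
        for _ in range(trials):
            if rng.random() < c:       # an increment occurs with probability c
                q *= alpha_r
        total += q
    return total / runs


q0, alpha_r, c, trials = 0.5, 0.6, 0.3, 10
alpha_equiv = 1.0 - c + alpha_r * c    # equivalent linear-model parameter
linear_q = q0 * alpha_equiv**trials    # deterministic linear-model value
simulated_q = rti_mean_q(q0, alpha_r, c, trials, runs=20000)
```

Under these assumptions the simulated average should agree with the 
linear-model value to within sampling error. 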

Karush and Dear (1966) obtained an optimal presentation strate- 
gy for the special case of the one-element model with uniform values 
of $\theta$ and $\gamma$ for all items, as well as equal initial values 
of q. The strategy is to choose at each stage that item for which the 
current probability of being in the learned state is least. This 
algorithm was later abstracted (cf. Atkinson, 1968; Calfee, 1970; 
Atkinson and Paulson, 1972) to a form based on counts of responses, 
where the item chosen for presentation is the one with the fewest 
consecutive correct responses since the last incorrect response. This 
algorithm is a special case of algorithm A, since choosing the item for 
which $\Delta Q$ is largest under conditions of uniform parameters is 
equivalent to choosing the item with the fewest consecutive correct 
responses. This can be shown as follows. Karush and Dear (1966) have 
shown that $\lambda_i$, the probability of being conditioned to a given 
item at the outset of trial i, is a monotonic non-decreasing function 
of i, given correct responses at each trial. Following any incorrect 
response, $\lambda$ is reset to $\theta$, the probability of transition 
to the conditioned state, given that the model was in the unconditioned 
state prior to reinforcement. Following the notation of Karush and Dear 
($\bar{E}$ for correct response, $E$ for incorrect response, $C$ denoting 
the conditioned state, $\bar{C}$ denoting the unconditioned state, and 
$\bar{p} = 1-p$ for any probability, p), then 



$$E\{\lambda_{i+1}\} = P\{\bar{E}\} P\{C_{i+1} \mid \bar{E}\} + P\{E\} P\{C_{i+1} \mid E\}$$

$$= (\lambda_i + \gamma \bar{\lambda}_i) \{ (\lambda_i + \gamma \theta \bar{\lambda}_i) / (\lambda_i + \gamma \bar{\lambda}_i) \} + \bar{\gamma} \bar{\lambda}_i \theta$$

$$= \lambda_i + \bar{\lambda}_i \theta$$

and 

$$E\{\bar{\lambda}_{i+1}\} = 1 - \lambda_i - \bar{\lambda}_i \theta = \bar{\lambda}_i \bar{\theta}$$

Hence, 

$$\Delta Q = q_i - q_{i+1}$$

and 

$$E\{\Delta Q\} = \bar{\gamma} (\bar{\lambda}_i - \bar{\lambda}_i \bar{\theta}) = \bar{\gamma} \bar{\lambda}_i \theta \qquad (5)$$

which is monotonic non-increasing, since $\lambda_i$ is monotonic non- 
decreasing. Therefore, choosing at each step the item for which 
$E\{\Delta Q\}$ is largest is equivalent to choosing the item, under 
conditions of uniform model parameters, for which the number of consecu- 
tive correct responses is least. Equation (5) also shows that the 
one-element model satisfies Condition 3) for A, guaranteeing that A 
holds for this model with arbitrary parameters. 
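
The derivation above can be illustrated numerically. In the following 
sketch the function names and the values of theta and gamma are 
hypothetical; the update rule is the Bayesian correction implied by the 
expressions for a correct and an incorrect response given above. 

```python
def update_lambda(lam, correct, theta, gamma):
    """Probability of the conditioned state at the outset of the next trial."""
    if correct:
        # Correct response: either already conditioned, or a lucky guess
        # followed by conditioning with probability theta.
        return (lam + gamma * theta * (1.0 - lam)) / (lam + gamma * (1.0 - lam))
    # Incorrect response: the item was unconditioned, so lambda resets to theta.
    return theta


def expected_gain(lam, theta, gamma):
    """Equation (5): expected decrease in error probability for one presentation."""
    return (1.0 - gamma) * (1.0 - lam) * theta


theta, gamma = 0.1, 0.5
lam = theta                  # state immediately after an incorrect response
gains = []
for _ in range(5):           # five consecutive correct responses
    gains.append(expected_gain(lam, theta, gamma))
    lam = update_lambda(lam, True, theta, gamma)
```

Each additional consecutive correct response raises $\lambda$ and lowers 
the expected gain, so with uniform parameters the item with the fewest 
consecutive correct responses has the largest expected gain. 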

Monte Carlo simulations were conducted for the one-element model 
for three cases. In all three cases, separate simulations were run 
for values of $\theta$ (or $\theta$-mean) equal to 0.025, 0.05, 0.1, and 
0.3, again, to allow for coordination with Calfee's data (1970). In the 
first two cases, the strategy specified by A was compared with the 
standard cyclical strategy for fixed uniform values of $\theta$, and for 
randomly determined values of $\theta$ (by the method specified for the 
linear model), respectively. The results of these simulations are shown 
in Figures 8-1, 8-3, 8-5, and 8-7 for the uniform values, and in 
Figures 8-2, 8-4, 8-6, and 8-8 for the random values. The curves show, 
first of all, that greater advantages are predicted for the optimal 
strategy by the one-element model than by the linear model. Secondly, 
the simulations show that greater gains can be expected, on the average, 
for the one-element model with non-uniform parameters than for the 
one-element model with uniform parameters. 

The third case involving the one-element model was a comparison 
of the optimal strategy given by A with a uniform random strategy. 
Again, the results show greater predicted gains using the one-element 
model than using the linear model. This is not surprising, certainly, 
since it would be expected that an optimal response-sensitive strategy 
should be able to use the additional information supplied by the re- 
sponses to advantage. 

Dear, et al. (1967), have reported an experiment designed to test 
the performance of an optimal presentation strategy for the one-element 
model with uniform parameters (Karush & Dear, 1966). The strategy to 
be tested was the special case of A just discussed. The experiment 
was designed to compare this strategy with the standard cyclical strat- 
egy for effectiveness. The primary result of the experiment was that 
no statistically significant difference between the two strategies was 
detectable in terms of post-test scores. As there was no evidence 
that a full simulation of the instructional process had been conducted 
to determine theoretically predicted differences between the two strat- 
egies, and since such a simulation involved only minor modifications 
to the programs used in the one-element simulations represented by 
Figures 8-1 through 8-12, it was decided to include this simulation 
in this investigation. The program used to conduct the simulation is 
listed in the Appendix, and serves to illustrate the techniques used 
in the one-element simulations as well. 

Figure 9-1 illustrates the expected sum-of-correct-responses 
scores versus number of trials for the two-choice response experiment, 
using the value of $\theta$ assumed by Dear, et al., while Figure 10-1 
illustrates the corresponding perfect-performance scores. Figures 
9-3 and 10-3 illustrate the corresponding information for the four- 
choice response experiment. Figures 9-1 and 9-3 illustrate the fact 
that, for both experiments, higher scores are predicted for both 
strategies than were actually observed, in addition to the fact that 
larger differences between the strategies were predicted than were 
observed, although the predicted differences are not as great as 
might have been expected. It may also be noted that the number of 
trials chosen by Dear, et al. (320) appears to be past the point of 
predicted maximum difference between the two strategies. This means 
that a greater difference might have been observed in the experiment 
if a smaller number of trials (between 200 and 2^0) had been chosen, 
although it is unlikely this would have been the case, since the 
maximum difference in either case is not a great deal larger than 
the difference at 320 trials. Figures 10-1 and 10-3, illustrating 
the mean perfect-performance scores for the two experiments, also 
show that a different choice for the number of trials probably 
would not have produced a significantly different result, in terms 
of the difference between the two strategies. 

Figures 9-2, 9-4, 10-2, and 10-4 show the results of the same 
simulations as were run for 9-1, 9-3, 10-1, and 10-3, respectively, 
except for the values of $\theta$ (0.05 for the two-choice response 
experiment and 0.0^^ for the four-choice response experiment). These 
values were chosen so that the expected sum-of-correct-responses scores 
for the standard strategy would be approximately the same as those ob- 
tained experimentally. The reason for doing this was to observe the 
corresponding predicted results for the optimal strategy for these 
values of $\theta$. Figures 9-2, 9-4, 10-2, and 10-4 show that the 
predicted differences for the two strategies are the same or greater 
than those corresponding to a $\theta$ of 0.1. It will be noted, however, 
that the maximum predicted differences occur in this case for values of 
m greater than 320. The basic conclusion drawn by Dear, et al., that the 
one-element model is shown to be inadequate for accurately representing 
the learning process involved, nevertheless appears to be justified 
on the basis of the simulation data. 

The final learning model to be discussed in the context of opti- 
mal instruction-sequencing is the Random-trial Increments (RTI) model 
(Matheson, 1964). It is well known that, in a mathematical sense, the 
linear and one-element models are merely special cases of the RTI mod- 
el corresponding to certain choices of parameters ($c = 1.0$ and 
$\alpha_R = 0.0$, respectively). As has already been mentioned, when the 
RTI model is used in response-insensitive mode, the values of $E\{T\}$ 
produced by a given strategy, on the average, correspond to those 
produced by the same strategy using a linear model with 



$$\alpha = \alpha_R c + 1 - c \qquad (6)$$

It appears that the RTI model might also be used in a response- 
sensitive mode by employing some statistical means of differentiating 
expected values of q following a correct response from those follow- 
ing an incorrect response, such as the Bayesian estimator used by 
Karush & Dear (1966). In order to obtain such an estimator for the 
RTI model, one might first consider the distribution of q values for 
sequential trials. Following Atkinson, et al. (1965, pp. 123-128), 
the joint probability of incorrect responses on trials i and i+1 is: 



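$$P\{E_i \& E_{i+1}\} = \sum_v c\,\alpha_R\, q_{v,i}^2 P\{S_{i-1}=v\} + \sum_v (1-c)\, q_{v,i}^2 P\{S_{i-1}=v\}$$

$$= (\alpha_R c + 1 - c) V_{2,i}$$

where 

$$V_{r,i} = \sum_v q_{v,i}^r P\{S_{i-1}=v\}$$

is the r-th moment of the distribution of q values on trial i. Now, 

$$V_{r,i} = (\alpha_R^r c + 1 - c)^{i-1} q_1^r$$

where $q_1$ is the initial value of q, and $P\{E_i\} = V_{1,i}$, so that 

$$P\{E_{i+1} \mid E_i\} = P\{E_i \& E_{i+1}\} / P\{E_i\}$$

$$= (\alpha_R c + 1 - c)\, q_1 \left[ \frac{\alpha_R^2 c + 1 - c}{\alpha_R c + 1 - c} \right]^{i-1} \qquad (7)$$

Correspondingly, the joint probability of a correct response on trial 
i and an incorrect response on trial i+1 is: 

$$P\{\bar{E}_i \& E_{i+1}\} = \sum_v c\,\alpha_R\, q_{v,i}(1-q_{v,i}) P\{S_{i-1}=v\} + \sum_v (1-c)\, q_{v,i}(1-q_{v,i}) P\{S_{i-1}=v\}$$

$$= (\alpha_R c + 1 - c)(V_{1,i} - V_{2,i})$$

Hence, 

$$P\{E_{i+1} \mid \bar{E}_i\} = P\{\bar{E}_i \& E_{i+1}\} / P\{\bar{E}_i\} = (\alpha_R c + 1 - c)(V_{1,i} - V_{2,i}) / (1 - V_{1,i}) \qquad (8)$$
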
Using Equations (7) and (8) as the basis for providing updated esti- 
mates of q values, given the previous response, a simulation was con- 
ducted to compare the results of applying algorithm A for three dif- 
ferent sets of RTI parameters. All values were chosen so that $\alpha$, 
as given by Equation (6), corresponded to one of the eight values used 
in the previous simulations involving the linear model. Each set of 
$\alpha_R$ and c values consisted of three pairs, two of which 
corresponded to the special cases of the linear and one-element models 
($c = 1.$ and $\alpha_R = 0.$, respectively), with the third pair 
corresponding to an intermediate RTI model. The results of the 
simulation are shown in Figures 11-1 through 11-8, corresponding to the 
eight different $\alpha$ values. As would be expected, the $E\{T\}$ 
values obtained for the linear cases (lower curves) were almost 
identical (subject only to minor statistical variation) to the values 
previously obtained for the actual linear model simulation. That the RTI 
model with $c = 1.$ and the linear model are equivalent in this sense 
can be seen from the fact that both $P\{E_{i+1} \mid E_i\}$ and 
$P\{E_{i+1} \mid \bar{E}_i\}$ reduce to $\alpha q_i$ ($= q_{i+1}$ for the 
linear model), meaning that the model is response-insensitive. The 
performance of A is seen to improve for values of $\alpha_R$ and c giving 
the same equivalent value of $\alpha$, but corresponding to RTI models 
lying in between the extremes of the linear and the one-element models 
(middle curves). The best performance is obtained for equivalent values 
of $\alpha_R$ and c corresponding to the one-element model (upper 
curves). The results seem to indicate that the RTI model, using the 
estimators of Equations (7) and (8), becomes "more response-sensitive", 
in a sense, as the parameter values range from those corresponding to 
the linear model, on the response-insensitive extreme, to those 
corresponding to the one-element model on the response-sensitive 
extreme. Since each pair of $\alpha_R$ and c values for a given Figure 
(11-1 through 11-8) yields the same $\alpha$ value, the lower curve 
represents the results of applying the standard response-insensitive 
cyclical strategy for any of the three models. 



C. Heuristic Search Techniques 



It appears that optimal instruction-sequencing problems employing 
present models of learning may be fairly easily handled by the methods 
of Section III-B. Since exhaustive-search techniques such as Dynamic 
Programming appear to be impractical, due to dimensionality constraints, 
the question arises as to approaches to the problem for models which 
cannot be handled algorithmically. In this regard, certain of the 
heuristic techniques developed in the field of Artificial Intelligence 
would appear to be applicable to the optimal instruction-sequencing 
problem. These techniques were developed specifically to provide an 
approach to search problems much too large to be handled by conven- 
tional search techniques. The particular heuristic method to be dis- 
cussed with regard to possible application to the optimal instruction- 
sequencing problem is basically the Ordered-search Algorithm of Nils- 
son (1971, p. 59). The search process will be described in the context 
of a state-space graph similar to that of Figure 4 of Section III-A. 
The process involves the concept of searching for a "goal node" in 
the graph, which in the present context could be taken either as any 
node at a depth m (m trials removed from the initial state, or "start 
node"), or as any node corresponding to a level of learning or condi- 
tioning at or above a certain criterion. If the search is conducted 
subject to the latter specification of "goal node", then "incremental 
path length" (the "cost", in a sense, of the path connecting two nodes) 
would be defined as 1, and the search for a minimum-length path to a 
goal node would correspond with the determination of the shortest se- 
quence of stimuli necessary to achieve a specified level of learning. 
If a goal node is defined as any node at a depth corresponding to a 
specified number of trials, m, then incremental path length would be 
defined as some form of complement (such as $n - \Delta Q$) of the 
incremental amount of learning which takes place between two nodes, and 
the search for a minimum-length path to a goal node in this case would 
correspond with the determination of the sequence of m stimuli which 
produces the highest level of learning. In either of these cases, the 
heuristic ordered-search algorithm illustrated in Figure 12 can be 
applied to determine a minimum-length path to a goal node. 

The algorithm is self-explanatory except for a few comments re- 
garding the heuristic function, $\hat{f}$. This function is a 
heuristically determined estimate of the length of a minimum-length path 
from the start node to a goal node constrained to pass through the node 
to which the $\hat{f}$ value to be calculated applies. The value of 
$\hat{f}$ at any node, n, is determined from 

$$\hat{f}(n) = \hat{g}(n) + \hat{h}(n) \qquad (9)$$

where $\hat{g}(n)$ is an estimate of the length of a minimum-length path 
from the start node to node n, and $\hat{h}(n)$ is an estimate of the 
length of a minimum-length path from node n to a goal node. More 
formally, if $k(n_i, n_j)$ is the length of a minimum-length path between 
any two arbitrary nodes, $n_i$ and $n_j$ (it is possible, in general, for 
more than one path to exist between two nodes; there may also be no path 
between two nodes, in which case k is undefined), and if T is a set of 
goal nodes, then 

$$h(n_i) = \min_{n_j \in T} k(n_i, n_j)$$

is the actual length of a minimum-length path between any node, $n_i$, 
and a goal node. The function $\hat{h}(n_i)$ is an estimate of this 
length. Similarly, 

$$g(n_i) = k(s, n_i)$$

is the actual length of a minimum-length path between the start node, 
s, and any node, $n_i$, accessible from s. The function $\hat{g}(n_i)$ is 
an estimate of this length. More will be said presently about the prac- 
tical determination of $\hat{g}$ and $\hat{h}$. 

1) Put the start node s on a list called OPEN and compute $\hat{f}(s)$. 

2) If OPEN is empty, exit with failure; otherwise continue. 

3) Remove from OPEN that node whose $\hat{f}$ value is smallest and put 
it on a list called CLOSED. Call this node n. (Resolve ties for 
minimal $\hat{f}$ values arbitrarily, but always in favor of any goal 
node.) 

4) If n is a goal node, exit with the solution path obtained by 
tracing back through the pointers; otherwise continue. 

5) Expand node n, generating all of its successors. (If there are no 
successors, go immediately to 2.) For each successor $n_i$, compute 
$\hat{f}(n_i)$. 

6) Associate with the successors not already on either OPEN or 
CLOSED the $\hat{f}$ values just computed. Put these nodes on OPEN and 
direct pointers from them back to n. 

7) Associate with those successors that were already on OPEN or 
CLOSED the smaller of the $\hat{f}$ values just computed and their 
previous $\hat{f}$ values. Put on OPEN those successors on CLOSED whose 
$\hat{f}$ values were thus lowered, and redirect to n the pointers from 
all nodes whose $\hat{f}$ values were lowered. 

8) Go to 2. 

Figure 12 Ordered-search Algorithm 

The expansion of nodes referred to in the algorithm of Figure 12 
merely amounts to determining all possible successors to a node one 
step away from that node. In the context of the state-space formula- 
tion for the optimal instruction-sequencing problem, this amounts to 
determining all possible state vectors which can result in one step 
from the present state vector as a function of which stimuli are 
given. In order to avoid the problem of inordinately large state 
spaces encountered in Dynamic Programming applications, the generated 
successor nodes are stored in the form of a list, as indicated in the 
algorithm, rather than requiring storage of the entire state space. 
This means, of course, that the entire list must be searched for pre- 
viously generated successors, as would be done in the tree formula- 
tion for a Dynamic Programming solution, but the portion of the entire 
graph which is searched when effective heuristics are employed is 
extremely small, so that computation time is not nearly as great a 
limitation as in a Dynamic Programming formulation. 

In the progression of the algorithm, the fact that a value of 
$\hat{f}$ is to be determined for a given node means that at least one 
path has been determined between that node and the start node. Thus, 
$\hat{g}(n)$ can be set equal to the length of the minimum-length path 
thus far determined between the start node and node n, with the 
guarantee that $\hat{g}(n) \ge g(n)$. It can be shown (cf. Nilsson, 1971, 
pp. 59-65) that if 

$$\hat{h}(n) \le h(n) \qquad (10)$$

and if 

$$\hat{h}(m) - \hat{h}(n) \le k(m,n)$$

for any two nodes, m and n, and if there exists a minimum-length path 
between the start node and a goal node, the algorithm specified in 
Figure 12 will find this path. The second inequality is not actually 
necessary to guarantee that an optimal path will be found, but does 
guarantee that once the algorithm expands a node, an optimal path to 
that node has been found. This inequality is called the consistency 
assumption (the assumption that the inequality is satisfied) and is 
usually satisfied if the heuristic information used in determining 
$\hat{h}$ is applied consistently at all nodes. 
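
The procedure of Figure 12 might be sketched as follows. The sketch 
keeps the OPEN list as a priority queue rather than a plain list, which 
is an implementation choice made here and not part of the figure; the 
function name, the callback arguments, and the small example graph are 
hypothetical. 

```python
import heapq
from itertools import count


def ordered_search(start, is_goal, successors, h_hat):
    """Return (path, length) of a minimum-length path, or None on failure.

    successors(n) yields (child, arc_length) pairs; h_hat(n) estimates the
    remaining length and is assumed not to exceed it (inequality (10)).
    """
    g_hat = {start: 0.0}      # shortest length found so far to each node
    parent = {start: None}    # back-pointers for recovering the path
    closed = set()
    tick = count()            # tie-breaker so the heap never compares nodes
    # Entries are (f_hat, goal flag, tick, node); False sorts before True,
    # so ties in f_hat are resolved in favor of goal nodes.
    open_heap = [(h_hat(start), not is_goal(start), next(tick), start)]
    while open_heap:          # an empty OPEN list means failure
        f, _, _, n = heapq.heappop(open_heap)
        if n in closed or f > g_hat[n] + h_hat(n):
            continue          # stale entry superseded by a shorter path
        closed.add(n)
        if is_goal(n):        # trace the pointers back to the start node
            goal, path = n, []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], g_hat[goal]
        for child, length in successors(n):
            g_new = g_hat[n] + length
            if child not in g_hat or g_new < g_hat[child]:
                g_hat[child] = g_new
                parent[child] = n
                closed.discard(child)   # reopen a node whose value was lowered
                heapq.heappush(
                    open_heap,
                    (g_new + h_hat(child), not is_goal(child), next(tick), child))
    return None


# Assumed four-node example with an estimate that never overestimates.
graph = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("t", 6.0)],
         "b": [("t", 2.0)], "t": []}
h = {"s": 4.0, "a": 3.0, "b": 2.0, "t": 0.0}
path, length = ordered_search("s", lambda n: n == "t", lambda n: graph[n], h.get)
```

Setting the estimate to zero everywhere in this sketch gives the blind 
search mentioned in the text; the same path is found, but in general 
more nodes are expanded. 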

The effectiveness of the search algorithm, in terms of computational 
power expended to find a goal node (a solution), depends critically on 
the heuristic function, $\hat{h}$. In effect, the requirement that (10) 
be satisfied guarantees that the solution is globally optimal. Selection 
of heuristics which give the largest value of $\hat{h}$ subject to (10) 
yields the solution with a minimum of computational effort (setting 
$\hat{h} = 0$ corresponds to a complete absence of heuristic information 
and results in inefficient blind search). Relaxing the inequality 
constraint on $\hat{h}$ may yield a solution requiring even less 
computational effort, but forfeits the guarantee of global optimality. 
Nevertheless, such a solution could be useful. 

As an example of the calculation of $\hat{h}$ for an optimal 
instruction-sequencing problem, consider the application of the 
ordered-search algorithm to a problem with the structure of Figure 1, 
using a model of learning which includes general stimulus interaction 
subject only to the restriction that application of any given stimulus 
is non-reinforcing to all but the corresponding component of the state 
vector. In other words, the effect of application of the r-th stimulus 
on any but the r-th q value would be to either increase it or leave it 
unchanged. Consider also that the search for a minimum-length path is 
taken in the context of a search for the shortest sequence of stimuli 
which will produce a given level of learning. Under these conditions, 
a possible heuristic for the determination of $\hat{h}$ would be to 
estimate the length of the minimum-length path without taking stimulus 
interaction into account. Due to the restriction on the interaction, 
$\hat{h}$ would certainly then constitute a lower bound on h, thus 
guaranteeing an optimal solution. If the properties of the learning 
model under these conditions were even further constrained, such as if 
$q_i$ were a monotonic function, or were known to vary at less than some 
given rate, then the determination of $\hat{h}$ could be quite 
straightforward. The description in general terms of the calculation of 
a heuristic function is quite difficult, at best, since its 
determination is intrinsically related to the detailed structure of the 
particular problem to be solved. At the same time, the determination of 
powerful heuristics is the crucial point in the effectiveness of 
heuristic search techniques. It must be left to further research into 
specific instruction-sequencing problems to determine how effective the 
heuristic paradigm may be in the solution of these problems. 
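
One way such an estimate might be computed is sketched below, under 
assumptions that go beyond the text: each item is taken to follow the 
linear model when presented (q becomes alpha*q), and a goal node is taken 
to be any state in which every q value is at or below a criterion value. 
The function names and numerical values are hypothetical. Since 
interaction can only raise the other q values or leave them unchanged, 
the count obtained by ignoring it cannot exceed the true remaining 
number of trials. 

```python
def presentations_needed(q, alpha, q_target):
    """Fewest presentations bringing q to q_target or below, ignoring interaction."""
    if not (0.0 <= alpha < 1.0) or q_target <= 0.0:
        raise ValueError("requires 0 <= alpha < 1 and q_target > 0")
    k = 0
    while q > q_target:
        q *= alpha        # one presentation of the item under the linear model
        k += 1
    return k


def h_hat(state, alpha, q_target):
    """Lower bound on remaining trials: the sum of the per-item requirements."""
    return sum(presentations_needed(q, a, q_target) for q, a in zip(state, alpha))


# Assumed two-item example.
estimate = h_hat(state=(0.5, 0.5), alpha=(0.5, 0.8), q_target=0.1)
```

For a model with no interaction at all the estimate is exact, and the 
ordered search would then expand only nodes along an optimal path. 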
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Chapter IV - 



Conclusions 



Of the three optimization techniques (exhaustive search, algorithmic,
and heuristic search) investigated for the problem structure outlined
in the Introduction, algorithmic methods seem the best suited for use
with presently accepted models of paired-associate learning
(specifically, the single-operator linear model, the one-element
model, and the random-trial increments model). Dynamic Programming as
a solution technique for problems involving these models is not only
unnecessarily complex, but is fundamentally and severely limited in
its applicability due to constraints of dimensionality, primarily in
terms of computer fast-storage size requirements. Modified exhaustive
techniques, such as State-Increment Dynamic Programming, depth-first
search, and last-stage search, may be used to alleviate memory-size
constraints, but computation-time constraints still severely limit the
size of problem which can be treated. Algorithmic methods of
optimization, i.e., methods by which the optimal decision at each step
may be specified immediately without the need for look-ahead search,
appear to be sufficient for the optimization problem involving models
with no stimulus interaction or time dependence. The optimal algorithm
specified in Section III-B of the report, algorithm A, has been shown
to be quite general in that the class of models to which it applies
includes the linear, one-element, and RTI models as special cases. In
addition, it is seen that the standard cyclical strategy, which is
optimal for the linear model with uniform parameters, the optimal
strategy of Karush & Dear (1966) for the one-element model with
uniform parameters, and the Largest Immediate Gain strategy of Calfee
(1970) are all special cases of algorithm A. An interesting by-product
of the algorithm is the way in which it makes clear the fact that, for
models without stimulus interaction, the actual order of presentation
of stimuli is immaterial, i.e., a specification of the number of times
each stimulus is to be presented is sufficient to construct an optimal
sequence, and all such sequences are equivalent.
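
The point about presentation order can be illustrated with a small
sketch. It assumes a linear model in which presenting item i replaces
q_i by a_i * q_i, and it uses a one-step largest-immediate-gain rule of
the kind that appears in the Figure A-1 listing. It is not offered as a
full statement of algorithm A, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def largest_immediate_gain(q0, a, n_trials):
    """At each trial, present the item whose error probability drops most."""
    q = list(q0)
    sequence = []
    for _ in range(n_trials):
        # Immediate gain of presenting item i under q_i <- a_i * q_i.
        gains = [q[i] - q[i] * a[i] for i in range(len(q))]
        best = max(range(len(q)), key=lambda i: gains[i])
        q[best] *= a[best]
        sequence.append(best)
    return sequence, q


def apply_sequence(q0, a, sequence):
    """Error probabilities after presenting the items in the given order."""
    q = list(q0)
    for i in sequence:
        q[i] *= a[i]
    return q


# Reordering the same presentations leaves the final state unchanged.
start, params = [0.9] * 4, [0.6, 0.7, 0.8, 0.9]
sequence, q_final = largest_immediate_gain(start, params, 12)
reordered = sequence[:]
random.Random(0).shuffle(reordered)
q_reordered = apply_sequence(start, params, reordered)
same_outcome = all(math.isclose(x, y) for x, y in zip(q_final, q_reordered))
counts = Counter(sequence)   # presentations per item
```

With uniform parameters the same rule gives every item the same number
of presentations, which matches the cyclical strategy up to the order
of presentation.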

Monte Carlo simulations were conducted to determine expected test
scores versus number of trials using the strategy specified by
algorithm A in comparison with the standard cyclical strategy and a
uniform random strategy for the three models used over a range of
parameter values. For the linear model with uniform parameters, of
course, the cyclical strategy is the optimal strategy, and a random
variation in parameters was therefore introduced to separate the
strategies. While the optimal strategy in this case did provide an
advantage over the cyclical strategy, the advantage was fairly small,
the maximum advantage being on the order of 5-10% in expected test
score for a given number of trials. Corresponding advantages for the
one-element model were much larger, the maximum advantage being on the
order of 20-25% in test score, with advantages for intermediate
response-sensitive RTI models lying in between. The conclusion to be
drawn seems to be that optimal instruction-sequencing strategies can
be used to best advantage by response-sensitive models which make the
most effective use of the additional statistical information contained
in the response history of a subject. This seems only logical, but
perhaps the most important point brought out by the simulations is the
magnitude of the difference involved. To take a specific example, the
uniform-parameter one-element model in response-insensitive mode is
statistically equivalent to a uniform-parameter linear model with
a = 1-c. The cyclical strategy applied to this one-element model is
therefore the optimal response-insensitive strategy, and the
simulations show that the optimal response-sensitive strategy applied
to the same model can provide a maximum advantage of 20-25% in test
score for a given number of trials, or 30-40% in the number of trials
necessary to achieve a given score. These results would tend to
indicate that the pursuit of optimal instruction-sequencing methods,
including the models involved, which make effective use of the
response history of a subject could be very worthwhile. In contrast
with this view, the results of the experiment of Dear, et al. (1967),
using the one-element model, would seem to indicate that
response-sensitive optimization strategies are fairly ineffective in
practice. In all likelihood, however, this is due primarily to the
inadequacy of the one-element model, per se, to represent the learning
process involved. The simulations of this experiment included in this
investigation tend to support this conclusion. It remains to be seen
whether or not more accurate models will result in effective optimal
strategies in practice.
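
The equivalence between the response-insensitive one-element model and
a linear model can be checked numerically. The sketch below is a
minimal illustration under stated assumptions: c is taken to be the
probability that an unlearned item becomes learned on one presentation,
q0 is the error probability of an unlearned item, and a learned item is
never missed. The function names and the sample size are hypothetical.

```python
import random


def one_element_mean_error(c, q0, n_presentations, n_items=20000, seed=0):
    """Monte Carlo estimate of the mean error probability per item."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_items):
        learned = False
        for _ in range(n_presentations):
            # All-or-none learning: one chance per presentation.
            if not learned and rng.random() < c:
                learned = True
        total += 0.0 if learned else q0
    return total / n_items


def linear_model_error(a, q0, n_presentations):
    """Error probability under the linear model q <- a * q."""
    return q0 * a ** n_presentations
```

When responses are ignored, the expected error probability of the
one-element item after n presentations is q0 * (1-c)**n, which is the
linear-model value with a = 1-c; the simulated mean should agree with
it to within sampling error.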

It is reasonable to expect that more complete models of the
paired-associate learning process will be sufficiently complex to
preclude the use of algorithmic methods such as algorithm A. A
heuristic-search technique which has the capability of overcoming the
dimensionality limitations of exhaustive-search techniques has been
presented as a possible alternative approach to the optimization
problem in such cases. Although its specification must necessarily be
fairly general at present, particularly with regard to the heuristics
involved, it is nevertheless quite likely that this approach could
prove viable through future research when appropriate learning models
are developed.

Chapter V - Recommendations



First of all, it is probable that further research into the
application of exhaustive-search techniques, such as Dynamic
Programming, to the optimal instruction-sequencing problem will prove
relatively fruitless in practice, primarily due to severe inherent
limitations of dimensionality. Algorithmic methods appear adequate for
present learning models, but the models themselves appear to be
inadequate from the standpoint of accurate representation of the
learning process. Nevertheless, further research into algorithmic
methods, such as algorithm A presented in Section III-B of this
report, may be worthwhile, particularly from the standpoint of
extending their applicability to broader classes of learning models.
As more complete learning models are developed, it is likely that
their complexity, particularly with regard to interrelationships among
the factors in the learning process, will be sufficiently great to
preclude a simple algorithmic approach to optimization. Heuristic
methods similar to that presented in this report may then be the only
viable approach to the problem, at least of the three approaches
considered in this investigation. Further research into such heuristic
methods, therefore, is definitely recommended.

Ultimately, of course, one of the most important practical
applications of optimal instruction-sequencing could be automatic
interactive optimization by computer in a CAI environment. Algorithmic
optimization methods are normally very easily implemented in such a
situation, which provides another good reason to pursue the
investigation of such methods to the limits of their applicability.
Even for more complex optimization tasks, heuristic methods give
promise of being able to provide effective instruction-sequencing in
interactive real-time CAI situations. Such methods should be pursued
further in this regard, as well.

Appendix - Simulation Details



      DIMENSION Q(2,10),A(10),SC1(10),SC2(10)
      REAL AC(7)/.6,.7,.8,.85,.9,.95,.97/
      IX=7654321
      DO 3 K1=1,7
      AM=AC(K1)
      WRITE(3,20)AM,SD
      S1=0.
      S2=0.
      DO 5 J=1,200
      DO 1 I1=1,10
      DO 1 I2=1,2
    1 Q(I2,I1)=0.9
      DO 6 K2=1,10
    8 AR=0.
      DO 9 I=1,12
      CALL RANDU(IX,IY,YFL)
      IX=IY
    9 AR=AR+YFL
      V=(AR-6.0)*SD+AM
      IF((V.GT.1.).OR.(V.LT.(2.*AM-1.)))GO TO 8
    6 A(K2)=V
      IC=0
      DO 4 K=1,200
      IC=IC+1
      IF(IC.GT.10)IC=1
      IND=IC
      Q(1,IND)=Q(1,IND)*A(IND)
      DMAX=Q(2,1)-Q(2,1)*A(1)
      IMAX=1
      DO 7 I=2,10
      DQ=Q(2,I)-Q(2,I)*A(I)
      IF(DQ.LE.DMAX)GO TO 7
      DMAX=DQ
      IMAX=I
    7 CONTINUE
      Q(2,IMAX)=Q(2,IMAX)*A(IMAX)
      IN=K/20
      IF(IN*20.NE.K)GO TO 4
      DO 11 L=1,10
      SC1(IN)=SC1(IN)+Q(1,L)
   11 SC2(IN)=SC2(IN)+Q(2,L)
    4 CONTINUE
    5 CONTINUE
      DO 2 I=1,10
      T1=100.-SC1(I)/20.
      T2=100.-SC2(I)/20.
    2 WRITE(3,21)I,T1,T2
    3 CONTINUE
   20 FORMAT(//5X,3HAM=,F4.2,5X,3HSD=,F7.5/)
   21 FORMAT(5X,I3,2(5X,F5.1))
      STOP



Figure A-1  Simulation of Optimization for Linear
            Model with Normally Distributed a-values
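
For readers unfamiliar with FORTRAN, the comparison carried out in
Figure A-1 can be approximated by the following sketch. It is a loose
rendering, not a transcription: it draws the a-values with Python's
random.gauss instead of summing twelve uniform deviates, and the
function names, default arguments, and seed are assumptions. As in the
listing, values outside the interval from 2*mean - 1 to 1 are rejected,
and scores are recorded every 20 trials.

```python
import random


def sample_a(mean, sd, rng):
    """Draw one a-value, rejecting values outside [2*mean - 1, 1]."""
    while True:
        v = rng.gauss(mean, sd)
        if 2.0 * mean - 1.0 <= v <= 1.0:
            return v


def compare_strategies(mean_a, sd, n_items=10, n_trials=200,
                       n_runs=200, q0=0.9, seed=1):
    """Return (trial, cyclical score, largest-gain score) every 20 trials."""
    rng = random.Random(seed)
    checkpoints = list(range(20, n_trials + 1, 20))
    err_cyc = {k: 0.0 for k in checkpoints}
    err_lig = {k: 0.0 for k in checkpoints}
    for _ in range(n_runs):
        a = [sample_a(mean_a, sd, rng) for _ in range(n_items)]
        q_cyc = [q0] * n_items
        q_lig = [q0] * n_items
        for k in range(1, n_trials + 1):
            # Strategy 1: cyclical presentation.
            i = (k - 1) % n_items
            q_cyc[i] *= a[i]
            # Strategy 2: item with the largest immediate drop in q.
            j = max(range(n_items), key=lambda m: q_lig[m] - q_lig[m] * a[m])
            q_lig[j] *= a[j]
            if k % 20 == 0:
                err_cyc[k] += sum(q_cyc)
                err_lig[k] += sum(q_lig)
    scale = 100.0 / (n_runs * n_items)   # convert to percent of items
    return [(k, 100.0 - err_cyc[k] * scale, 100.0 - err_lig[k] * scale)
            for k in checkpoints]
```

Calling compare_strategies(0.8, 0.1) returns ten rows of expected test
scores in percent. Since both strategies are run on the same sampled
a-values, the second score in each row should never fall below the
first under this model.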







      DIMENSION Q1(10),Q2(10),GF(10),SC(10),F(10)

IX=5173^ 

      REAL ART(7)/.111,.143,.2,.25,.333,.5,.625/
      REAL CT(7)/.45,.35,.25,.2,.15,.1,.08/
      DO 10 K1=1,7
      AR=ART(K1)
      C=CT(K1)
      A=AR*C+1.-C
      G=(AR*AR*C+1.-C)/A
      V12=0.9*A
      WRITE(3,21)AR,C,A
      DO 2 I=1,10
    2 SC(I)=0.0
      DO 12 J=1,1000
      DO 1 I=1,10
      GF(I)=V12
      F(I)=V12
      Q1(I)=0.9
    1 Q2(I)=0.9
      DO 4 K=1,200
      QMAX=Q2(1)
      IMAX=1
      DO 6 I=2,10
      IF(Q2(I).LE.QMAX)GO TO 6
      QMAX=Q2(I)
      IMAX=I
    6 CONTINUE
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.GT.Q1(IMAX))GO TO 8
      Q2(IMAX)=GF(IMAX)
      GO TO 9
    8 Q2(IMAX)=(A-GF(IMAX))*(F(IMAX)/(1.-F(IMAX)))
    9 GF(IMAX)=GF(IMAX)*G
      F(IMAX)=F(IMAX)*A
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.GT.C)GO TO 13
      IF(Q1(IMAX).GT.1E-20)Q1(IMAX)=Q1(IMAX)*AR
   13 IND=K/20
      IF(IND*20.NE.K)GO TO 4
      DO 5 K2=1,10
    5 SC(IND)=SC(IND)+Q1(K2)
    4 CONTINUE
   12 CONTINUE
   11 DO 10 I=1,10
      M=20*I
      TS=100.-SC(I)/100.
   10 WRITE(3,20)M,TS
   20 FORMAT(5X,I3,5X,F5.1)
   21 FORMAT(//5X,3HAR=,F5.3,3X,2HC=,F5.3,3X,2HA=,F5.3/)



Figure A-2  Simulation of Optimization for RTI Model
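
The listings obtain their uniform deviates from the library routine
RANDU, called as RANDU(IX,IY,YFL) with IX then reset to IY. For
reference, the recurrence commonly documented for RANDU (multiplier
65539, modulus 2**31) can be sketched as follows, together with a
simple frequency count. The number of bins and the seed used in any
particular run are assumptions of the sketch.

```python
def randu(ix):
    """One RANDU step: next integer state and a deviate in (0, 1)."""
    iy = (65539 * ix) % 2**31
    return iy, iy / 2**31


def frequency_distribution(seed, n_samples=100_000, n_bins=10):
    """Count how many deviates fall into each of n_bins equal intervals."""
    counts = [0] * n_bins
    ix = seed                     # RANDU expects an odd starting value
    for _ in range(n_samples):
        ix, y = randu(ix)
        counts[int(y * n_bins)] += 1
    return counts
```

A call such as frequency_distribution(1) gives bin counts that can be
compared against the expected 10,000 per bin for 100,000 samples.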
      DIMENSION Q(17,2),IND(2),SC1(10),SC2(10)
      DIMENSION P1S(10),P2S(10),P(17,2)
      IX=7654321
      REAL G(4)/.5,.25,.5,.25/
      REAL T(4)/.1,.1,.05,.044/
      DO 2 K1=1,4
      GA=G(K1)
      TH=T(K1)
      WRITE(3,21)GA,TH
      IND(1)=0
      IQS=17
      NSAV=17
      SAVQ=1.0
      DO 12 I=1,10
      P1S(I)=0.
      P2S(I)=0.
      SC1(I)=0.
   12 SC2(I)=0.
      DO 5 J=1,1000
      DO 1 I1=1,16
      DO 1 I2=1,2
      P(I1,I2)=0.
    1 Q(I1,I2)=0.
      DO 4 K=1,400
      IND(1)=IND(1)+1
      IF(IND(1).GT.16)IND(1)=1
   10 QMIN=Q(1,2)
      IQ=1
      DO 6 N=2,16
      IF(Q(N,2).GE.QMIN)GO TO 6
      QMIN=Q(N,2)
      IQ=N
    6 CONTINUE
      IF(IQ.NE.IQS)GO TO 9
      SAVQ=QMIN
      NSAV=IQ
      Q(IQ,2)=1.01
      GO TO 10
    9 Q(NSAV,2)=SAVQ
      IQS=IQ
      NSAV=17
      SAVQ=1.0
      IND(2)=IQ
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.LT.TH)P(IND(1),1)=1.
      ID=IND(2)
      IF(P(ID,2).EQ.1.)GO TO 8
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.LT.GA)GO TO 8
      Q(ID,2)=TH
      GO TO 7
    8 QT=Q(ID,2)
      X=GA*(1.-QT)
      Q(ID,2)=(QT+X*TH)/(QT+X)
    7 CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.LT.TH)P(ID,2)=1.
      IN=K/40
      IF(IN*40.NE.K)GO TO 4
      P1=1.
      P2=1.
      DO 11 L=1,16
      P1=P1*(GA+P(L,1)*(1.-GA))
      P2=P2*(GA+P(L,2)*(1.-GA))
      SC1(IN)=SC1(IN)+(1.-P(L,1))*(1.-GA)
   11 SC2(IN)=SC2(IN)+(1.-P(L,2))*(1.-GA)
      P1S(IN)=P1S(IN)+P1
      P2S(IN)=P2S(IN)+P2
    4 CONTINUE
    5 CONTINUE
      DO 2 I=1,10
      M=40*I
      TS1=16.-SC1(I)/1000.
      TS2=16.-SC2(I)/1000.
      PS1=P1S(I)/1000.
      PS2=P2S(I)/1000.
    2 WRITE(3,20)M,TS1,TS2,PS1,PS2
   20 FORMAT(5X,I5,4(5X,F8.5))
   21 FORMAT(//5X,6HGAMMA=,F4.3,2X,6HTHETA=,F4.3/)
      STOP
      END



Figure A-3  Simulation of Experiment of Dear, et al. (1967)
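
The response-sensitive step in Figure A-3 updates, for the item just
presented, an estimate of the probability that the item is in the
learned state. Read from the listing, an error resets the estimate to
THETA, while a correct response applies a Bayes update followed by one
further chance to learn. The sketch below restates that step; the
function and argument names are hypothetical.

```python
def update_learned_probability(p_learned, correct, guess, theta):
    """Estimated P(item is learned) after one response and reinforcement.

    p_learned: estimate before the trial
    correct:   True if the subject responded correctly
    guess:     probability of a correct guess in the unlearned state
    theta:     probability of learning on one reinforcement
    """
    if not correct:
        # An error can only come from the unlearned state, so the item
        # is learned afterwards only if this reinforcement succeeds.
        return theta
    x = guess * (1.0 - p_learned)             # unlearned but guessed right
    posterior = p_learned / (p_learned + x)   # P(learned | correct)
    return posterior + (1.0 - posterior) * theta
```

The correct-response branch is algebraically the same as the
expression (QT+X*TH)/(QT+X) in the listing. It is undefined when both
p_learned and guess are zero, a case in which a correct response could
not occur.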



      SUM1=0.
      DO 8 J=1,K3
      S1=0.
      S2=0.
      DO 1 I=1,10
      Q1(I)=1.
    1 Q2(I)=1.
      DO 4 K=1,M
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IND=YFL*10.+1
      CALL RANDU(IX,IY,YFL)
      IX=IY
      IF(YFL.LT.C)Q1(IND)=Q1(IND)*ALPH
      QMAX=Q2(1)
      IMAX=1
      DO 6 I=2,10
      IF(Q2(I).LE.QMAX)GO TO 6
      QMAX=Q2(I)
      IMAX=I
    6 CONTINUE
      CALL RANDU(IX,IY,YFL)
      IX=IY
    4 IF(YFL.LT.C)Q2(IMAX)=Q2(IMAX)*ALPH
      DO 5 L=1,10
      S1=S1+Q1(L)
    5 S2=S2+Q2(L)
      SUM1=SUM1+S1
      SUM2=SUM2+S2
      S(J,1)=100.-S1*10.
      S(J,2)=100.-S2*10.
    8 CONTINUE

SMi=inn. -suMi/Ki* 

SM2 = 10n.-SUM2/Kl* 
      X1=0.
      X2=0.
      DO 9 I=1,K3
      X1=X1+(SM1-S(I,1))**2
    9 X2=X2+(SM2-S(I,2))**2
      SIG1=SQRT(X1/(FK3*(FK3-1)))
      SIG2=SQRT(X2/(FK3*(FK3-1)))
      PS1=SIG1*100/SM1
      PS2=SIG2*100/SM2
      WRITE(3,25)SIG1,SIG2
      WRITE(3,26)PS1,PS2



Figure A-4  Routine for Determining Confidence Levels
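
The closing statements of Figure A-4 compute the standard error of the
mean score over the simulation runs and express it as a percentage of
the mean. A minimal sketch of that calculation follows; the function
names are hypothetical and the scores are assumed to be held in a
plain list.

```python
import math


def standard_error_of_mean(scores):
    """sqrt( sum((mean - s)**2) / (n * (n - 1)) ) for n >= 2 scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are required")
    mean = sum(scores) / n
    squared_dev = sum((mean - s) ** 2 for s in scores)
    return math.sqrt(squared_dev / (n * (n - 1)))


def percent_standard_error(scores):
    """Standard error of the mean as a percentage of the mean score."""
    mean = sum(scores) / len(scores)
    return 100.0 * standard_error_of_mean(scores) / mean
```

For example, the scores 1, 2, 3 have mean 2 and a sum of squared
deviations of 2, so the standard error of the mean is the square root
of 2/6.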
Table A-1



Frequency Distribution for Pseudo-random
Number Generator (RANDU - IBM 370/155)
100,000 Samples